Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: validate proposed change(s) in triagebot.toml for PRs ✨ #1730

Merged
merged 3 commits into from
Jan 22, 2024

Conversation

meysam81
Copy link
Contributor

@meysam81 meysam81 commented Oct 8, 2023

fixes rust-lang/rust#106104

A successful run is visible here: meysam81/triagebot-test#2 (comment)

Some highlights I need to mention here:

  • I can't be sure if the proposed change is placed in the right spot, but it works! 🤷
  • If it's not too much to ask, would you please add hacktoberfest-accepted to this PR, since the repo is not marked with hacktoberfest.
  • One question of mine is: why is cargo fmt, nor cargo clippy not formatting my code? 🤔
  • How can I write test for this? I couldn't figure this part out!

@meysam81
Copy link
Contributor Author

meysam81 commented Oct 8, 2023

r?

@rustbot
Copy link
Collaborator

rustbot commented Oct 8, 2023

Error: Parsing assign command in comment failed: ...'' | error: specify user to assign to at >| ''...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@meysam81
Copy link
Contributor Author

meysam81 commented Oct 8, 2023

r? dev-tools

src/config.rs Outdated
Comment on lines 18 to 21
pub fn config_file_name() -> &'static str {
CONFIG_FILE_NAME
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this utility function? Can CONFIG_FILE_NAME be made public and used directly in handler.rs?

src/handlers.rs Outdated
@@ -124,6 +124,29 @@ pub async fn handle(ctx: &Context, event: &Event) -> Vec<HandlerError> {
errors
}

async fn validate_triagebot(ctx: &Context, event: &IssuesEvent, errors: &mut Vec<HandlerError>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see this function would inject an HTTP call in the issue_handlers macro which is called quite often. I am a bit worried about the additional overhead on every issue the triagebot is called to handle.

What is your goal here @meysam81 ? validate the triagebot config for a certain repository, right? Can this check happen just once on triagebot startup?

Also the name of this function could be a bit more explicit about its purpose (e.g. validate_triagebot_config or something similar).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be placed in its own module at minimum (similar to all the other handlers). I'd emulate something like rustc_commits in terms of structure.

I would expect that if the goal is to validate PRs, then we would do this only on push events. It probably also makes sense to delegate the actual validation to the config module, not just calling toml::from_slice here.

@meysam81 meysam81 changed the title feat: valdiate proposed change in triagetoml for PRs ✨ feat: valdiate proposed change(s) in triage.toml for PRs ✨ Oct 14, 2023
@meysam81
Copy link
Contributor Author

@Mark-Simulacrum

As you suggested, I refactored the changes into its own module.

You may also see a successful run here: meysam81/triagebot-test#2 (comment)

@meysam81 meysam81 changed the title feat: valdiate proposed change(s) in triage.toml for PRs ✨ feat: valdiate proposed change(s) in triagebot.toml for PRs ✨ Oct 14, 2023
Comment on lines 35 to 43
let diff = match event.issue.diff(&ctx.github).await {
Ok(Some(diff)) => diff,
wildcard => {
log::error!("Failed to fetch diff: {:?}", wildcard);
return Ok(None);
}
};

if diff.contains(crate::config::CONFIG_FILE_NAME) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few concerns about this. IIUC, this fetches the full diff on every PR on every push. That seems like it would be expensive, and potentially slow.

Would it be possible to use the files() method instead? I would imagine that is more efficient, and is also structured.

Similarly, this looks like it does a string contains check, which means that if any text anywhere in the patch contains the bytes triagebot.toml, it would trigger. If it looked at the structured file list, then it should be able to look to see if that file is specifically being modified.

I'm still not sure if hitting the files endpoint on every push is OK, but I suspect the traffic should be low enough to be fine.

@ehuss
Copy link
Contributor

ehuss commented Oct 17, 2023

BTW, there is now a merge conflict.

@meysam81 meysam81 marked this pull request as draft October 23, 2023 06:46
@meysam81 meysam81 marked this pull request as ready for review November 12, 2023 13:13
@ehuss
Copy link
Contributor

ehuss commented Nov 13, 2023

I'm a bit confused about the approach taken here. It seems to validate the config on every PR opened or pushed. I would expect it to operate like it did before, where it would only validate the config on a PR that is modifying triagebot.toml. Can you say more about why that is changing? I would also not expect there to be any need to do any caching if it was only checking in the case where triagebot.toml is added or modified.

return triagebot_toml.clone();
}

log::warn!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing a warn! on every cache miss would pollute the logs with unactionable information. I would expect something like this to be trace or debug. Similarly with the info! above about a hit, that information is likely to add noise.

@meysam81
Copy link
Contributor Author

I'm a bit confused about the approach taken here. It seems to validate the config on every PR opened or pushed. I would expect it to operate like it did before, where it would only validate the config on a PR that is modifying triagebot.toml. Can you say more about why that is changing? I would also not expect there to be any need to do any caching if it was only checking in the case where triagebot.toml is added or modified.

The main driver for the new approach was to address the expensive HTTP round-trips mentioned here: #1730 (comment)

Additionally, at the expense of using a fixed size memory for holding the config (128 total triagebot.toml ATM), we achieve a speed up in case no new commits has been pushed and only a synchronization is triggered.

src/github.rs Outdated
pub struct PullRequestFile {
pub sha: String,
pub filename: String,
pub blob_url: String,
pub raw_url: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are raw_url and Clone still needed?

@ehuss
Copy link
Contributor

ehuss commented Nov 16, 2023

The main driver for the new approach was to address the expensive HTTP round-trips mentioned here: #1730 (comment)

My concern in that comment was about using the diff endpoint which fetches the entire contents of every PR. Would it be possible to use the files() endpoint instead to check if the PR touches triagebot.toml, and only download the file in that case? I don't think that approach should need any sort of caching.

In the future, we could potentially cache the files() output, since I think there are several other handlers that need to check if a PR touches certain files. But that cache would only need to live as long as the loop through the handlers, and then can be immediately discarded. But I think that is something we could consider later, not something I would bother with in this PR.

@ehuss ehuss changed the title feat: valdiate proposed change(s) in triagebot.toml for PRs ✨ feat: validate proposed change(s) in triagebot.toml for PRs ✨ Nov 16, 2023
@ehuss
Copy link
Contributor

ehuss commented Nov 16, 2023

Hm, so I'm taking a closer look at this, and I need the diff for something else. We're already fetching the diff 3 times in other handlers. I'll try to follow up soon with something that caches that.

@ehuss
Copy link
Contributor

ehuss commented Nov 17, 2023

Ok, I posted #1749 to implement caching of the diff (since I need it for something else). If/when that PR is merged, I think this PR can be updated to just call the diff() function to determine if it contains triagebot.toml, and if it does, fetch the contents and validate it (much like it did before 287b414).

Sorry about the back and forth on this.

This adds deny_unknown_fields to Config so that any typo mistakes
will be caught by the config validation.
@ehuss
Copy link
Contributor

ehuss commented Jan 22, 2024

I hope you don't mind, I went ahead and applied some of the changes suggested above. I also made a few other changes:

  • Updated toml to the latest version to get better error messages.
  • Added deny_unknown_fields to show typo errors to the user.
  • Switch to using SHA for the file download instead of the git ref. The git ref has issues with caching, and can take several minutes to update. This also uses the pre-existing code for fetching raw files.

Thanks for helping with this!

@ehuss ehuss merged commit 7a497a0 into rust-lang:master Jan 22, 2024
2 checks passed
@apiraino
Copy link
Contributor

Added deny_unknown_fields to show typo errors to the user.

@ehuss IIUC I think commit 252acb2 caused unexpected side-effects (see comment).

It will report errors also on PRs adding new config options to triagebot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate triagebot.toml in tidy/CI
5 participants